NSF PAR Search | NSF Public Access Repository

Humans in the loop: Community science and machine learning synergies for overcoming herbarium digitization bottlenecks

https://doi.org/10.1002/aps3.11560

Guralnick, Robert; LaFrance, Raphael; Denslow, Michael; Blickhan, Samantha; Bouslog, Mark; Miller, Sean; Yost, Jenn; Best, Jason; Paul, Deborah L; Ellwood, Elizabeth; et al (January 2024, Applications in Plant Sciences)

Abstract Premise: Among the slowest steps in the digitization of natural history collections is converting imaged labels into digital text. We present here a working solution to overcome this long‐recognized efficiency bottleneck that leverages synergies between community science efforts and machine learning approaches. Methods: We present two new semi‐automated services. The first detects and classifies typewritten, handwritten, or mixed labels from herbarium sheets. The second uses a workflow tuned for specimen labels to label text using optical character recognition (OCR). The label finder and classifier was built via humans‐in‐the‐loop processes that utilize the community science Notes from Nature platform to develop training and validation data sets to feed into a machine learning pipeline. Results: Our results showcase a >93% success rate for finding and classifying main labels. The OCR pipeline optimizes pre‐processing, multiple OCR engines, and post‐processing steps, including an alignment approach borrowed from molecular systematics. This pipeline yields >4‐fold reductions in errors compared to off‐the‐shelf open‐source solutions. The OCR workflow also allows human validation using a custom Notes from Nature tool. Discussion: Our work showcases a usable set of tools for herbarium digitization including a custom‐built web application that is freely accessible. Further work to better integrate these services into existing toolkits can support broad community use.
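The abstract mentions combining multiple OCR engines with an alignment approach borrowed from molecular systematics. The listing does not describe how that alignment is done, so the following is only a minimal sketch of the final consensus step, assuming the OCR outputs have already been aligned to equal length. The function name `consensus` and the gap symbol `~` are hypothetical and are not taken from the paper.

```python
from collections import Counter

GAP = "~"  # hypothetical gap symbol; assumed never to occur in real label text


def consensus(aligned: list[str]) -> str:
    """Column-wise majority vote over pre-aligned OCR outputs.

    Each string is one engine's output, already padded with GAP so that
    all strings have the same length (the alignment itself is not shown).
    """
    if not aligned:
        raise ValueError("need at least one aligned string")
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned strings must all have the same length")

    out = []
    for column in zip(*aligned):
        # most_common orders ties by first occurrence, so on a tie the
        # earliest engine in the list wins.
        char, _ = Counter(column).most_common(1)[0]
        if char != GAP:  # a winning gap means "no character here"
            out.append(char)
    return "".join(out)
```

For example, `consensus(["Quercus a1ba", "Quercus alba", "Quercvs alba"])` outvotes the two single-engine misreads, one column at a time.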
Full Text Available
Unified and pluralistic ideals for data sharing and reuse in biodiversity

https://doi.org/10.1093/database/baad048

Sterner, Beckett; Elliott, Steve; Gilbert, Edward E.; Franz, Nico M. (July 2023, Database)

Abstract How should billions of species observations worldwide be shared and made reusable? Many biodiversity scientists assume the ideal solution is to standardize all datasets according to a single, universal classification and aggregate them into a centralized, global repository. This ideal has known practical and theoretical limitations, however, which justifies investigating alternatives. To support better community deliberation and normative evaluation, we develop a novel conceptual framework showing how different organizational models, regulative ideals and heuristic strategies are combined to form shared infrastructures supporting data reuse. The framework is anchored in a general definition of data pooling as an activity of making a taxonomically standardized body of information available for community reuse via digital infrastructure. We describe and illustrate unified and pluralistic ideals for biodiversity data pooling and show how communities may advance toward these ideals using different heuristic strategies. We present evidence for the strengths and limitations of the unification and pluralistic ideals based on systemic relationships of power, responsibility and benefit they establish among stakeholders, and we conclude the pluralistic ideal is better suited for biodiversity data.
Decentralized but Globally Coordinated Biodiversity Data

https://doi.org/10.3389/fdata.2020.519133

Sterner, Beckett W.; Gilbert, Edward E.; Franz, Nico M. (October 2020, Frontiers in Big Data)
Full Text Available
A data management workflow of biodiversity data from the field to data users

https://doi.org/10.1002/aps3.11310

Hackett, Rachel A.; Belitz, Michael W.; Gilbert, Edward E.; Monfils, Anna K. (December 2019, Applications in Plant Sciences)

Full Text Available
Towards a dynamic checklist of lichen-forming, lichenicolous and allied fungi of Ecuador – using the Consortium of Lichen Herbaria to manage fungal biodiversity in a megadiverse country

https://doi.org/10.1017/S0024282923000476

Yánez-Ayabaca, Alba; Benítez, Ángel; Molina, Rosa Batallas; Naranjo, Domenica; Etayo, Javier; Prieto, María; Cevallos, Gabriela; Caicedo, Erika; Scharnagl, Klara; McNerlin, Britton; et al (September 2023, The Lichenologist)

Abstract A checklist of Lichen-forming, Lichenicolous and Allied Fungi of Ecuador is presented with a total of 2599 species, of which 39 are reported for the first time from the country. The names of three species, Hypotrachyna montufariensis, H. subpartita and Sticta hypoglabra, previously not validly published, are validated. Pertusaria oahuensis, originally introduced by Magnusson as ‘ad interim’, is validated as Lepra oahuensis. The form Leucodermia leucomelos f. albociliata is validated. Two new combinations, Fissurina tectigera and F. timida, are made, and Physcia mobergii is introduced as a replacement name for the illegitimate P. lobulata Moberg non (Flörke) Arnold. In an initial step, the checklist was compiled by reviewing literature records of Ecuadorian lichen biota spanning from the late 19th century to the present day. Subsequently, records were added based on vouchers from 56 collections participating in the Consortium of Lichen Herbaria, a Symbiota-based biodiversity platform with particular focus on, but not exclusive to, North and South America. Symbiota provides sophisticated tools to manage biodiversity data, such as occurrence records, a taxonomic thesaurus, and checklists. The thesaurus keeps track of frequently changing names, distinguishing taxa currently accepted from ones considered synonyms. The software also provides tools to create and manage checklists, with an emphasis on selecting vouchers based on occurrence records that can be verified for identification accuracy. Advantages and limitations of creating checklists in Symbiota versus traditional ways of compiling these lists are discussed. Traditional checklists are well suited to document current knowledge as a ‘snapshot in time’. They are important baselines, frequently used by ecologists and conservation scientists as an established naming convention for citing species reported from a country.
Compiling these lists, however, requires an immense effort, only to inadequately address the dynamic nature of scientific discovery. Traditional checklists are thus quickly out of date, particularly in groups with rapidly changing taxonomy, such as lichenized fungi. Especially in megadiverse countries, where new species and new occurrences continue to be discovered, traditional checklists are not easily updated; these lists necessarily fall short of efficiently managing immense data sets, and they rely primarily on secondary evidence (i.e. literature records rather than specimens). Ideally, best practices make use of dynamic database platforms such as Symbiota to assess occurrence records based both on literature citations and voucher specimens. Using modern data management tools comes with a learning curve. Systems like Symbiota are not necessarily intuitive and their functionality can still be improved, especially when handling literature records. However, online biodiversity data platforms have much potential in more efficiently managing and assessing large biodiversity data sets, particularly when investigating the lichen biota of megadiverse countries such as Ecuador.
Full Text Available
Distributed, but Global in Reach: Outline of a de-centralized paradigm for biodiversity data intelligence

https://doi.org/10.3897/biss.3.37749

Franz, Nico; Gilbert, Edward; Sterner, Beckett (July 2019, Biodiversity Information Science and Standards)

We provide an overview and update on initiatives and approaches to add taxonomic data intelligence to distributed biodiversity knowledge networks. "Taxonomic intelligence" for biodiversity data is defined here as the ability to identify and reconcile source-contextualized taxonomic name-to-meaning relationships (Remsen 2016). We review the scientific opportunities, as well as information-technological and socio-economic pathways - both existing and envisioned - to embed de-centralized taxonomic data intelligence into the biodiversity data publication and knowledge integration processes. We predict that the success of this project will ultimately rest on our ability to up-value the roles and recognition of systematic expertise and experts in large, aggregated data environments. We will argue that these environments will need to adhere to criteria for responsible data science and interests of coherent communities of practice (Wenger 2000, Stoyanovich et al. 2017). This means allowing for fair, accountable, and transparent representation and propagation of evolving systematic knowledge and enduring or newly apparent conflict in systematic perspective (Sterner and Franz 2017, Franz and Sterner 2018, Sterner et al. 2019). We will demonstrate, in principle and through concrete use cases, how to de-centralize systematic knowledge while maintaining alignments between congruent or conflicting taxonomic concept labels (Franz et al. 2016a, Franz et al. 2016b, Franz et al. 2019). The suggested approach uses custom-configured logic representation and reasoning methods, based on the Region Connection Calculus (RCC-5) alignment language. The approach offers syntactic consistency and semantic applicability or scalability across a wide range of biodiversity data products, ranging from occurrence records to phylogenomic trees.
We will also illustrate how this kind of taxonomic data intelligence can be captured and propagated through existing or envisioned metadata conventions and standards (e.g., Senderov et al. 2018). Having established an intellectual opportunity, as well as a technical solution pathway, we turn to the issue of developing an implementation and adoption strategy. Which biodiversity data environments are currently the most taxonomically intelligent, and why? How is this level of taxonomic data intelligence created, maintained, and propagated outward? How are taxonomic data intelligence services motivated or incentivized, both at the level of individuals and organizations? Which "concerned entities" within the greater biodiversity data publication enterprise are best positioned to promote such services? Are the most valuable lessons for biodiversity data science "hidden" in successful social media applications? What are good, feasible, incremental steps towards improving taxonomic data intelligence for a diversity of data publishers?
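RCC-5 distinguishes five basic relations between two regions: congruence, proper inclusion, inverse proper inclusion, overlap, and exclusion. As a rough illustration only, the sketch below classifies the relation between two taxonomic concepts by comparing hypothetical sets of member identifiers. This set comparison is an assumption made for the example; the abstract describes logic-based reasoning, not this procedure, and the function name and symbols (`==`, `>`, `<`, `><`, `!`) are shorthand chosen here.

```python
def rcc5_relation(a: frozenset, b: frozenset) -> str:
    """Return the RCC-5 relation of concept `a` to concept `b`.

    `a` and `b` are hypothetical sets of member identifiers (for example,
    specimen IDs) standing in for the extent of each taxonomic concept.
    """
    if not a or not b:
        # RCC regions are assumed to be non-empty.
        raise ValueError("concepts must have at least one member")
    if a == b:
        return "=="  # congruence: same extent
    if a > b:
        return ">"   # proper inclusion: a strictly contains b
    if a < b:
        return "<"   # inverse proper inclusion: a strictly inside b
    if a & b:
        return "><"  # overlap: shared members, neither contains the other
    return "!"       # exclusion: no shared members
```

For instance, a broad concept holding specimens `{"s1", "s2", "s3"}` properly includes a narrower one holding `{"s1", "s2"}`, so `rcc5_relation` returns `">"` for that pair.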
Full Text Available
Announcing Big-Bee: An initiative to promote understanding of bees through image and trait digitization

https://doi.org/10.3897/biss.5.74037

Seltmann, Katja; Allen, Julie; Brown, Brian; Carper, Adrian; Engel, Michael; Franz, Nico; Gilbert, Edward; Grinter, Chris; Gonzalez, Victor; Horsley, Pam; et al (September 2021, Biodiversity Information Science and Standards)

While bees are critical to sustaining a large proportion of global food production, as well as pollinating both wild and cultivated plants, they are decreasing in both numbers and diversity. Our understanding of the factors driving these declines is limited, in part, because we lack sufficient data on the distribution of bee species to predict changes in their geographic range under climate change scenarios. Additionally lacking is adequate data on the behavioral and anatomical traits that may make bees either vulnerable or resilient to human-induced environmental changes, such as habitat loss and climate change. Fortunately, a wealth of associated attributes can be extracted from the specimens deposited in natural history collections for over 100 years. Extending Anthophila Research Through Image and Trait Digitization (Big-Bee) is a newly funded US National Science Foundation Advancing Digitization of Biodiversity Collections project. Over the course of three years, we will create over one million high-resolution 2D and 3D images of bee specimens (Fig. 1), representing over 5,000 worldwide bee species, including most of the major pollinating species. We will also develop tools to measure bee traits from images and generate comprehensive bee trait and image datasets to measure changes through time. The Big-Bee network of participating institutions includes 13 US institutions (Fig. 2) and partnerships with US government agencies. We will develop novel mechanisms for sharing image datasets and datasets of bee traits that will be available through an open, Symbiota-Light (Gilbert et al. 2020) data portal called the Bee Library. In addition, biotic interaction and species association data will be shared via Global Biotic Interactions (Poelen et al. 2014). The Big-Bee project will engage the public in research through community science via crowdsourcing trait measurements and data transcription from images using Notes from Nature (Hill et al. 2012). 
Training and professional development for natural history collection staff, researchers, and university students in data science will be provided through the creation and implementation of workshops focusing on bee traits and species identification. We are also planning a short, artistic college radio segment called "the Buzz" to get people excited about bees, biodiversity, and the wonders of our natural world.
Full Text Available
The Amphibian Genomics Consortium: advancing genomic and genetic resources for amphibian research and conservation

https://doi.org/10.1186/s12864-024-10899-7

Kosch, Tiffany A; Torres-Sánchez, María; Liedtke, H Christoph; Summers, Kyle; Yun, Maximina H; Crawford, Andrew J; Maddock, Simon T; Ahammed, Md Sabbir; Araújo, Victor L N; Bertola, Lorenzo V; et al (December 2024, BMC Genomics)

Full Text Available
THE CALIFORNIA PHENOLOGY COLLECTIONS NETWORK: USING DIGITAL IMAGES TO INVESTIGATE PHENOLOGICAL CHANGE IN A BIODIVERSITY HOTSPOT

https://doi.org/10.3120/0024-9637-66.4.130

Yost, Jenn M.; Pearson, Katelin D.; Alexander, Jason; Gilbert, Edward; Hains, Layla Aerne; Barry, Teri; Bencie, Robin; Bowler, Peter; Carter, Benjamin; Crowe, Rebecca E.; et al (January 2020, Madroño)

Full Text Available
